BACKGROUND OF THE INVENTION
[0005] In many audio applications, an audio signal is
modified or processed to achieve a desired characteristic
or quality. One characteristic of an audio signal that is
frequently modified is its playback speed. Sounds are
usually recorded at the normal speed and frequency at which
the source produces them. When the playback speed is
changed, however, the frequency usually changes with it,
which is heard as a change in pitch. For example, if a
woman's voice is recorded at normal speed and then played
back at a slower rate, it will resemble a man's voice, i.e.
a voice at a lower frequency. Similarly, if a man's voice
is recorded at normal speed and then played back at a
faster rate, it will resemble a woman's voice, i.e. a voice
at a higher frequency.

[0006] Some applications may require that an audio 
signal be played at a fast rate, while maintaining the same 
frequency, i.e. keeping the pitch of the sound at the same 
level as when played back at the normal speed. 

[0007] Further limitations and disadvantages of 
conventional and traditional approaches will become apparent 
to one of ordinary skill in the art through comparison of 
such systems with the present invention as set forth in the 
remainder of the present application with reference to the 
drawings.



BRIEF SUMMARY OF THE INVENTION 

[0008] Aspects of the present invention may be seen in a
method for speeding up an encoded original audio signal,
said original audio signal having an original frequency and
an original playback speed. The method is performed in a
system with a machine-readable storage having stored
thereon a computer program having at least one code
section. The at least one code section is executable by a
machine for causing the machine to perform operations
comprising receiving the encoded original audio signal;
retrieving frames of the original audio signal; skipping
frames at a rate according to a desired playback speed,
wherein said desired playback speed is greater than the
original playback speed; applying a window function to the
remaining frames; converting the signal with the windowed
frames from digital to analog format; and using the
original frequency to play back the analog format signal.

[0009] The system comprises at least one processor 
capable of receiving the encoded original audio signal; 
retrieving frames of the original audio signal; skipping 
frames at a rate according to a desired playback speed; 
applying a window function to the remaining frames; 
converting the signal with windowed frames from digital to 
analog format; and using the original frequency to play back
the analog format signal. 

[0010] The method comprises receiving the encoded
original audio signal; retrieving frames of the original
audio signal; skipping frames at a rate according to a
desired playback speed; applying a window function to the
remaining frames; converting the signal with windowed frames
from digital to analog format; and using the original
frequency to play back the analog format signal.

[0011] In an embodiment of the present invention, the 
desired playback speed is a predefined default value. 

[0012] In another embodiment of the present invention, 
the desired playback speed is a programmable value. 

[0013] These and other features and advantages of the 
present invention may be appreciated from a review of the 
following detailed description of the present invention, 
along with the accompanying figures in which like reference 
numerals refer to like parts throughout. 



BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS
[0014] Figure 1 illustrates a block diagram of an 
exemplary time-domain encoding of an audio signal, in 
accordance with an embodiment of the present invention. 

[0015] Figure 2 illustrates a block diagram of an 
exemplary time-domain decoding of an audio signal, in 
accordance with an embodiment of the present invention. 

[0016] Figure 3 illustrates a flow diagram of an 
exemplary method for time-domain decoding of an audio 
signal, in accordance with an embodiment of the present 
invention.

[0017] Figure 4 illustrates a block diagram of an
exemplary frequency-domain encoding of an audio signal, in
accordance with an embodiment of the present invention.

[0018] Figure 5 illustrates a block diagram of an 
exemplary frequency-domain decoding of an audio signal, in 
accordance with an embodiment of the present invention. 

[0019] Figure 6 illustrates a flow diagram of an
exemplary method for frequency-domain decoding of an audio
signal, in accordance with an embodiment of the present
invention.

[0020] Figure 7 illustrates a block diagram of an
exemplary audio decoder, in accordance with an embodiment of
the present invention.

DETAILED DESCRIPTION OF THE INVENTION
[0021] The present invention relates generally to audio
decoding. More specifically, this invention relates to
decoding audio signals to obtain an audio signal at a faster
speed while maintaining the same pitch as the original audio
signal, so that the sped-up signal sounds the same as the
original without a noticeable change in pitch. Although
aspects of the present invention are presented in terms of a
generic audio signal, it should be understood that the
present invention may be applied to many other types of
systems.

[0022] Figure 1 illustrates a block diagram of an
exemplary time-domain encoding of an audio signal 111, in
accordance with an embodiment of the present invention. The
audio signal 111 is captured and sampled to convert it from
analog to digital format using, for example, an
analog-to-digital converter (ADC). The samples of the audio
signal 111 are then grouped into frames 113 (F_0 ... F_n) of
1024 samples such as, for example, (F_x(0) ... F_x(1023)). The
frames 113 are then encoded according to one of many
encoding schemes depending on the system.
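
The framing step described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `group_into_frames` is hypothetical, and zero-padding a trailing partial frame is an assumption, since the text does not say how a final incomplete frame is handled.

```python
FRAME_SIZE = 1024  # samples per frame, as stated in the text


def group_into_frames(samples, frame_size=FRAME_SIZE):
    """Split a flat sequence of samples into consecutive frames F_0 ... F_n.

    Assumption (not from the text): a trailing partial frame is
    zero-padded up to frame_size.
    """
    frames = []
    for start in range(0, len(samples), frame_size):
        frame = list(samples[start:start + frame_size])
        frame.extend([0] * (frame_size - len(frame)))  # pad the last frame
        frames.append(frame)
    return frames
```

With 2500 input samples, for example, this yields three frames of 1024 samples, the last one padded with zeros.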

[0023] Figure 2 illustrates a block diagram of an
exemplary time-domain decoding of an audio signal, in
accordance with an embodiment of the present invention. In
an embodiment of the present invention, the input to the
decoder is frames 213 (F_0 ... F_n) of 1024 samples such as,
for example, frames 113 (F_0 ... F_n) of 1024 samples of Fig. 1.

[0024] The frames 213 (F_0 ... F_n) are then skipped at a
rate consistent with the desired fast rate. For example, if
the desired audio speed is twice the original speed, then
every other frame is skipped, resulting in frames 212
(FR_0 ... FR_m) of 1024 samples, where FR_0 = F_0, FR_1 = F_2,
etc. Additionally, m depends on the desired fast rate. In
the example where the desired audio speed is twice the
original speed, m = n/2. If, for example, the desired audio
speed is three times the original speed, then every third
frame is played back, and the two consecutive frames in
between are skipped, so frames 213 (F_0 ... F_n) result in
frames 212 (FR_0 ... FR_m), where FR_0 = F_0, FR_1 = F_3,
FR_2 = F_6, FR_3 = F_9, etc., and m = n/3.
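
The frame-skipping rule can be written as a short sketch. The function name `skip_frames` is hypothetical, and restricting the speed factor to positive integers is an assumption drawn from the two examples in the text (twice and three times the original speed).

```python
def skip_frames(frames, speed_factor):
    """Keep every speed_factor-th frame: FR_i = F_(i * speed_factor).

    Assumption: speed_factor is a positive integer (2 = twice as fast,
    3 = three times as fast); fractional factors are not covered here.
    """
    if not isinstance(speed_factor, int) or speed_factor < 1:
        raise ValueError("speed_factor must be a positive integer")
    return frames[::speed_factor]
```

For frames F_0 ... F_6 (n = 6), a factor of 2 keeps F_0, F_2, F_4, F_6 (m = 3 = n/2) and a factor of 3 keeps F_0, F_3, F_6 (m = 2 = n/3).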

[0025] A window function WF is then applied to frames 212
(FR_0 ... FR_m) to "smooth out" the samples and ensure that the
resulting signal does not have any artifacts that may result
from skipping frames. The window function results in the
windowed frames 214 (WF_0 ... WF_L) of 1024 samples. The
window function WF can be one of many widely known and used
window functions, or can be designed to accommodate the
design requirements of the system.
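
As one possible illustration of the windowing step, the sketch below multiplies each retained frame by a Hann window. The choice of the Hann window and the names `hann_window` and `apply_window` are assumptions; the text leaves the window function open.

```python
import math


def hann_window(length):
    """Return a symmetric Hann window that is 0 at both ends."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (length - 1))
            for i in range(length)]


def apply_window(frames, window):
    """Multiply every frame sample-by-sample with the window."""
    return [[s * w for s, w in zip(frame, window)] for frame in frames]
```

Tapering each frame toward zero at its edges removes the abrupt jump between frames that were not adjacent in the original signal, at the cost of some amplitude modulation; a system could trade this off with a different window shape.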

[0026] The windowed frames 214 (WF_0 ... WF_L) of 1024
samples are then run through a digital-to-analog converter
(DAC) to get an analog signal 211. The analog signal 211 is
a shorter version of the analog input signal 111 of Fig. 1
(analog signal 211 and analog signal 111 are not equal).
When the analog signal 211 is played at the same frequency
as the original signal 111 of Fig. 1, the speed, in the
example with skipping every other frame, is effectively
twice the speed of the original audio, but the pitch remains
the same, since the playback frequency remains unchanged.
Faster audio playback is thus achieved without affecting
the pitch.
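
The effect on duration can be checked with simple arithmetic. The sketch below assumes a sample rate of 48000 Hz, a value not given in the text, and uses the hypothetical helper name `playback_seconds`.

```python
def playback_seconds(num_frames, frame_size=1024, sample_rate_hz=48000):
    """Duration of num_frames frames played at an unchanged sample rate.

    Assumption: 48000 Hz is only an illustrative sample rate.
    """
    return num_frames * frame_size / sample_rate_hz


original = playback_seconds(100)  # all 100 frames
fast = playback_seconds(50)       # every other frame kept
```

Because the sample rate is the same in both cases, the samples inside each retained frame are reproduced at their original frequencies; only the number of frames, and therefore the total duration, is halved.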

[0027] Figure 3 illustrates a flow diagram of an
exemplary method for time-domain decoding of an audio
signal, in accordance with an embodiment of the present
invention. At a starting block 421, an input is received
from the encoder directly, using a storage device, or
through a communication medium. The input, which is coming
from the encoder, is frames (F_0 ... F_n). Then, depending on
the rate at which the audio signal needs to be sped up, the
proper number of frames is skipped at a next block 423, as
described above with reference to Fig. 2, resulting in the
frames (FR_0 ... FR_m).

[0028] At a next block 425, a window function WF is
applied to the frames (FR_0 ... FR_m) to "smooth out" the
samples and ensure that the resulting signal does not have
any artifacts that may result from skipping frames. The
window function results in the windowed frames (WF_0 ... WF_L).
The window function WF can be one of many widely known and
used window functions, or can be designed to accommodate the
design requirements of the system.

[0029] The windowed frames (WF_0 ... WF_L) are then sent
through the DAC at a next block 427 to produce the audio
signal at the desired fast speed, with the same pitch as the
original because the playback frequency is kept the same as
the original signal.

[0030] Standards such as, for example, MPEG-1, Layer 3
(MPEG stands for Moving Picture Experts Group), MPEG-4
AAC (Advanced Audio Coding) and Dolby AC-3 have been
devised for compressing audio signals. In certain
embodiments of the present invention, the audio signal can
be compressed in accordance with such standards for
compressing audio signals.

[0031] Figure 4 illustrates a block diagram describing
the encoding of an audio signal 101, in accordance with the
MPEG-1, Layer 3 standard. The audio signal 101 is captured
and sampled to convert it from analog to digital format
using, for example, an analog-to-digital converter (ADC).
The samples of the audio signal 101 are then grouped into
frames 103 (F_0 ... F_n) of 1024 samples such as, for example,
(F_x(0) ... F_x(1023)).

[0032] The frames 103 (F_0 ... F_n) are then grouped into
windows 105 (W_0 ... W_n), each one of which comprises 2048
samples or two frames such as, for example,
(W_x(0) ... W_x(2047)) comprising frames (F_x(0) ... F_x(1023))
and (F_{x+1}(0) ... F_{x+1}(1023)). However, each window 105 W_x
has a 50% overlap with the previous window 105 W_{x-1}.
Accordingly, the first 1024 samples of a window 105 W_x are
the same as the last 1024 samples of the previous window 105
W_{x-1}. For example,
W_0 = (W_0(0) ... W_0(2047)) = (F_0(0) ... F_0(1023)) and
(F_1(0) ... F_1(1023)), and
W_1 = (W_1(0) ... W_1(2047)) = (F_1(0) ... F_1(1023)) and
(F_2(0) ... F_2(1023)). Hence, in the example, W_0 and W_1 both
contain the frame (F_1(0) ... F_1(1023)).
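
The 50% overlap can be illustrated with a small sketch. The function name `build_windows` is hypothetical, and the handling of the last frame is an assumption: since it has no successor, the sketch simply produces one window fewer than there are frames.

```python
def build_windows(frames):
    """Form overlapping windows W_x = F_x followed by F_(x+1).

    Assumption: the last frame has no successor, so k frames yield
    k - 1 windows; the text does not specify the boundary handling.
    """
    return [frames[x] + frames[x + 1] for x in range(len(frames) - 1)]
```

With frames of four samples standing in for the 1024-sample frames, `build_windows([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])` gives two windows of eight samples whose shared half is the middle frame.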

[0033] A window function w(t) is then applied to each
window 105 (W_0 ... W_n), resulting in sets (wW_0 ... wW_n) of
2048 windowed samples 107 such as, for example,
(wW_x(0) ... wW_x(2047)). A modified discrete cosine transform
(MDCT) is then applied to each set (wW_0 ... wW_n) of windowed
samples 107 (wW_x(0) ... wW_x(2047)), resulting in sets
(MDCT_0 ... MDCT_n) of 1024 frequency coefficients 109 such as,
for example, (MDCT_x(0) ... MDCT_x(1023)). A different transform,
such as a Fourier or wavelet transform, can also be applied
depending upon the audio signal qualities used during
encoding.
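
A direct, unoptimized sketch of the windowing and MDCT step is given below. It uses the commonly cited MDCT definition, in which 2N input samples produce N coefficients, and a sine window; both the sine window and the function names are assumptions, since the text does not fix a window shape or a transform normalization. Real encoders use fast algorithms rather than this O(N^2) form.

```python
import math


def sine_window(length):
    """Sine window over a block of `length` samples (an assumed choice)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block):
    """Direct MDCT: 2N input samples -> N frequency coefficients."""
    n = len(block) // 2
    return [
        sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(2 * n))
        for k in range(n)
    ]


def encode_window(window_samples):
    """Apply the window function, then the MDCT, to one window W_x."""
    w = sine_window(len(window_samples))
    windowed = [s * wi for s, wi in zip(window_samples, w)]
    return mdct(windowed)
```

For a 2048-sample window this returns 1024 coefficients, matching the sizes given in the text.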

[0034] The sets of transform coefficients 109
(MDCT_0 ... MDCT_n) are then quantized and coded for
transmission, forming an audio elementary stream (AES). The
AES can be multiplexed with other AESs. The multiplexed
signal, known as the Audio Transport Stream (Audio TS), can
then be stored and/or transported for playback on a playback
device. The playback device can be either local to or
remotely located from the encoder. Where the playback device
is remotely located, the multiplexed signal is transported
over a communication medium such as, for example, the
Internet. The multiplexed signal can also be transported to
a remote playback device using a storage medium such as, for
example, a compact disk.

[0035] During playback, the Audio TS is de-multiplexed, 
resulting in the constituent AES signals. The constituent 
AES signals are then decoded, yielding the audio signal. 
During playback the speed of the signal may be increased to 
produce the original audio at a faster speed. 

[0036] Figure 5 is a block diagram describing the
decoding of an audio signal, in accordance with another
embodiment of the present invention. In an embodiment of
the present invention, the input to the decoder is sets
(MDCT_0 ... MDCT_n) of 1024 frequency coefficients 209 such as,
for example, the sets (MDCT_0 ... MDCT_n) of 1024 frequency
coefficients 109 of Fig. 4. An inverse modified discrete
cosine transform (IMDCT) is applied to each set
(MDCT_0 ... MDCT_n) of 1024 frequency coefficients 209. The
result of applying the IMDCT is the sets (wW_0 ... wW_n) of
windowed samples 207 (wW_x(0) ... wW_x(2047)) equivalent to sets
(wW_0 ... wW_n) of windowed samples 107 (wW_x(0) ... wW_x(2047)) of
Fig. 4.

[0037] An inverse window function w_x(t) is then applied
to each set (wW_0 ... wW_n) of 2048 windowed samples 207,
resulting in windows 205 (W_0 ... W_n), each one of which
comprises 2048 samples from two frames such as, for example,
(W_x(0) ... W_x(2047)) comprising frames (F_x(0) ... F_x(1023))
and (F_{x+1}(0) ... F_{x+1}(1023)), as illustrated in Fig. 4. The
frames 203 (F_0 ... F_n) of 1024 samples such as, for example,
(F_x(0) ... F_x(1023)), are then extracted from the windows 205
(W_0 ... W_n). Commonly known windows such as, for example,
Hanning, Hamming, Blackman, Gaussian or Kaiser can be used.
Additionally, a user-defined window can also be used
depending on the requirements.
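
One common way to recover a frame from MDCT coefficients is sketched below. It is not the literal "inverse window function" named in the text: it swaps in the windowed overlap-add technique widely used with the MDCT, in which the IMDCT output of two neighbouring windows is windowed again and the overlapping halves are summed. The sine window, the 2/N scaling and all function names are assumptions made for this illustration.

```python
import math


def sine_window(length):
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block):
    """Direct MDCT: 2N samples -> N coefficients."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n)) for k in range(n)]


def imdct(coeffs):
    """Direct IMDCT: N coefficients -> 2N samples (2/N scaling assumed)."""
    n = len(coeffs)
    return [(2.0 / n) * sum(coeffs[k] * math.cos(
                math.pi / n * (t + 0.5 + n / 2) * (k + 0.5)) for k in range(n))
            for t in range(2 * n)]


def reconstruct_middle_frame(f0, f1, f2):
    """Recover F_1 from windows (F_0, F_1) and (F_1, F_2) by overlap-add."""
    n = len(f1)
    w = sine_window(2 * n)

    def round_trip(block):
        # Encoder side: window + MDCT. Decoder side: IMDCT + window.
        coeffs = mdct([s * wi for s, wi in zip(block, w)])
        return [s * wi for s, wi in zip(imdct(coeffs), w)]

    first = round_trip(f0 + f1)   # second half covers F_1
    second = round_trip(f1 + f2)  # first half covers F_1
    return [a + b for a, b in zip(first[n:], second[:n])]
```

In this scheme each decoded window on its own contains aliasing; it is the sum of the two overlapping halves that reproduces the frame, which is why the windows overlap by 50%.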

[0038] The frames 203 (F_0 ... F_n) are then skipped at a
rate consistent with the desired fast rate. For example, if
the desired audio speed is twice the original speed, then
every other frame is skipped, resulting in frames 202
(FR_0 ... FR_m) of 1024 samples, where FR_0 = F_0, FR_1 = F_2,
etc. Additionally, m depends on the desired fast rate. In
the example where the desired audio speed is twice the
original speed, m = n/2. If, for example, the desired audio
speed is three times the original speed, then every third
frame is played back, and the two in between are skipped, so
frames 203 (F_0 ... F_n) result in frames 202 (FR_0 ... FR_m),
where FR_0 = F_0, FR_1 = F_3, FR_2 = F_6, FR_3 = F_9, etc., and
m = n/3.

[0039] A window function WF is then applied to frames 202
(FR_0 ... FR_m) to "smooth out" the samples and ensure that the
resulting signal does not have any artifacts that may result
from skipping frames. The window function results in the
windowed frames 204 (WF_0 ... WF_L) of 1024 samples. The
window function WF can be one of many widely known and used
window functions, or can be designed to accommodate the
design requirements of the system.

[0040] The windowed frames 204 (WF_0 ... WF_L) of 1024
samples are then run through a digital-to-analog converter
(DAC) to get an analog signal 201. The analog signal 201 is
a shorter version of the analog input signal 101 of Fig. 4
(analog signal 201 and analog signal 101 are not equal).
When the analog signal 201 is played at the same frequency
as the original signal 101 of Fig. 4, the speed, in the
example with skipping every other frame, is effectively
twice the speed of the original audio, but the pitch remains
the same, since the playback frequency remains unchanged.
Faster audio playback is thus achieved without affecting
the pitch.

[0041] Figure 6 illustrates a flow diagram of an
exemplary method for frequency-domain decoding of an audio
signal, in accordance with an embodiment of the present
invention. At a starting block 401, an input is received
from the encoder directly, using a storage device, or
through a communication medium. The input, which is coming
from the encoder, is quantized and coded sets of frequency
coefficients of an MDCT (MDCT_0 ... MDCT_n). At a next block
403 the input is inverse modified discrete cosine
transformed, yielding sets (wW_0 ... wW_n) of 2048 windowed
samples. An inverse window function is then applied to the
windowed samples at a next block 405, producing the windows
(W_0 ... W_n), each of which comprises 2048 samples. The
windows are the result of overlapping frames (F_0 ... F_n),
which may be obtained by inverse overlapping the windows
(W_0 ... W_n) at a next block 407. Then, depending on the rate
at which the audio signal needs to be sped up, the proper
number of frames is skipped at a next block 409, as
described above with reference to Fig. 5, resulting in the
frames (FR_0 ... FR_m).

[0042] At a next block 410, a window function WF is
applied to the frames (FR_0 ... FR_m) to "smooth out" the
samples and ensure that the resulting signal does not have
any artifacts that may result from skipping frames. The
window function results in the windowed frames (WF_0 ... WF_L).
The window function WF can be one of many widely known and
used window functions, or can be designed to accommodate the
design requirements of the system.

[0043] The windowed frames (WF_0 ... WF_L) are then sent
through the DAC at a next block 411 to produce the audio
signal at the desired fast speed, with the same pitch as the
original because the playback frequency is kept the same as
the original signal.

[0044] Figure 7 illustrates a block diagram of an
exemplary audio decoder, in accordance with an embodiment of
the present invention. The encoded audio signal is
delivered from signal processor 301, and the advanced audio
coding (AAC) bit-stream 303 is de-multiplexed by a
bit-stream de-multiplexer 305. This includes Huffman decoding
307, scale factor decoding 311, and decoding of side
information used in tools such as mono/stereo 313, intensity
stereo 317, TNS 319, and the filter bank 321.

[0045] The sets of frequency coefficients 109
(MDCT_0 ... MDCT_n) of Fig. 4 are decoded and copied to an
output buffer in a sample fashion. After Huffman decoding
307, an inverse quantizer 309 inverse quantizes each set of
frequency coefficients 109 (MDCT_0 ... MDCT_n) by a 4/3-power
nonlinearity. The scale factors 311 are then used to scale
sets of frequency coefficients 109 (MDCT_0 ... MDCT_n) by the
quantizer step size.
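
The inverse quantization described above can be sketched as a sign-preserving 4/3-power expansion followed by scaling. The function name `inverse_quantize` and the plain `step_size` multiplier are assumptions; how a decoded scale factor maps to the quantizer step size is codec-specific and is not spelled out in the text.

```python
import math


def inverse_quantize(q, step_size=1.0):
    """Expand a quantized coefficient by the 4/3-power nonlinearity.

    Assumption: step_size is a hypothetical, already-derived quantizer
    step size for the coefficient's scale factor band.
    """
    magnitude = abs(q) ** (4.0 / 3.0)
    return math.copysign(magnitude, q) * step_size
```

For instance, a quantized value of 8 expands to approximately 16 before scaling, because 8 raised to the power 4/3 is 16.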

[0046] Additionally, tools including the mono/stereo 313,
prediction 315, intensity stereo coupling 317, TNS 319, and
filter bank 321 can apply further functions to the sets of
frequency coefficients 109 (MDCT_0 ... MDCT_n). The gain
control 323 transforms the frequency coefficients 109
(MDCT_0 ... MDCT_n) into a time-domain audio signal by
applying the IMDCT, the inverse window function, and the
inverse window overlap as explained above with reference to
Fig. 5. If the signal is not compressed, then the IMDCT, the
inverse window function, and the inverse window overlap
steps are skipped, as shown in Fig. 2.

[0047] The output of the gain control 323, which is
frames (F_0 ... F_n) such as, for example, frames 203 or frames
213, is then sent to the audio processing unit 325 for
additional processing, playback, or storage. The audio
processing unit 325 receives an input from a user regarding
the speed at which the audio signal should be played, or has
access to a default value for the factor of speeding up the
audio signal at playback. The audio processing unit 325
then processes the audio signal according to the factor for
fast playback by skipping frames from the frames (F_0 ... F_n)
at a rate consistent with the desired fast rate. For
example, if the desired audio speed is twice the original
speed, then every other frame is skipped, resulting in
frames (FR_0 ... FR_m) such as, for example, frames 202 or
frames 212, of 1024 samples, where FR_0 = F_0, FR_1 = F_2,
etc. Additionally, m depends on the desired fast rate. In
the example where the desired audio speed is twice the
original speed, m = n/2. If, for example, the desired audio
speed is three times the original speed, then every third
frame is played back, and the two in between are skipped, so
frames (F_0 ... F_n) result in frames (FR_0 ... FR_m), where
FR_0 = F_0, FR_1 = F_3, FR_2 = F_6, FR_3 = F_9, etc., and
m = n/3.

[0048] A window function WF is then applied to frames
(FR_0 ... FR_m) to "smooth out" the samples and ensure that the
resulting signal does not have any artifacts that may result
from skipping frames. The window function results in the
windowed frames (WF_0 ... WF_L) such as, for example, frames
204 or frames 214, of 1024 samples. The window function WF
can be one of many widely known and used window functions,
or can be designed to accommodate the design requirements of
the system.

[0049] At this point the signal is still in digital form, 
so the output of the audio processing unit 325 is run 
through a DAC 327, which converts the digital signal to an 
analog audio signal to be played through a speaker 329. 

[0050] In an embodiment of the present invention, the
playback speed is pre-determined in the design of the
decoder. In another embodiment of the present invention,
the playback speed is entered by a user of the decoder, and
varies accordingly.
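
The two embodiments, a pre-determined speed and a user-entered speed, can be sketched together with the skipping and windowing steps of the audio processing unit. Everything named here is hypothetical: `DEFAULT_SPEED_FACTOR`, `fast_playback_samples`, the default value of 2 and the Hann window are illustrative assumptions, and the returned list merely stands in for the digital signal handed to the DAC.

```python
import math

DEFAULT_SPEED_FACTOR = 2  # hypothetical pre-determined default


def fast_playback_samples(frames, speed_factor=None):
    """Skip frames, window the rest, and return the flat digital signal.

    speed_factor=None selects the default; an integer selects the
    user-entered speed. Frames are assumed to hold at least 2 samples.
    """
    factor = DEFAULT_SPEED_FACTOR if speed_factor is None else speed_factor
    if not isinstance(factor, int) or factor < 1:
        raise ValueError("speed factor must be a positive integer")
    kept = frames[::factor]  # FR_i = F_(i * factor)
    if not kept:
        return []
    size = len(kept[0])
    # Hann window as an assumed window function WF.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (size - 1))
              for i in range(size)]
    out = []
    for frame in kept:
        out.extend(s * w for s, w in zip(frame, window))
    return out
```

Calling the function without a speed factor uses the default, while passing, for example, 3 models a user request for triple speed.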

[0051] While the present invention has been described 
with reference to certain embodiments, it will be understood 
by those skilled in the art that various changes may be made 
and equivalents may be substituted without departing from 
the scope of the present invention. In addition, many 
modifications may be made to adapt a particular situation or 
material to the teachings of the present invention without 
departing from its scope. Therefore, it is intended that 
the present invention not be limited to the particular 
embodiment disclosed, but that the present invention will 
include all embodiments falling within the scope of the 
appended claims. 